The research performed under this grant centers on the concept of a network
computer. By this we mean a network of computers (no shared memory) which
can be programmed as if it were a single virtual machine using a high level
distributed language. Work during this past year can be divided into three areas.

A. Distributed Algorithms

A number of distributed algorithms have been described in the literature for
network  computers.  We  are  particularly  interested  in  low  level  algorithms  which, 
for  example,  might  be  used  in  a  distributed  operating  system  to  support  resource 
allocation  or  enhance  reliability.  The  Arpanet  routing  algorithm  [MW77]  is  an 
early  example.  More  recently  attention  has  been  given  to  problems  related  to  the 
implementation  of  distributed  databases.  Distributed  algorithms  for  guaranteeing 
the  consistency  of  such  databases  under  concurrent  operation  and  in  the  face  of 
component failures have been developed (e.g., [Gr78], [BG81], [Th78], [Bo82],
[BL82]). Other classes of algorithms deal with distributed deadlock detection (e.g.,
[MM70], [CM82]), load balancing in a distributed system (e.g., [BF81], [HW80],
[Sh83]) and resource allocation (e.g., [ADD82], [Sm79]). Finally, algorithms
related  to  techniques  for  organizing  a  distributed  system  such  as  the  election  of  a 
leader  [Ga82]  and  the  enforcement  of  distributed  synchronization  [Sc82]  fall  in 
this  category. 

Algorithms  of  this  type  may  involve  one  or  more  of  the  following  features: 
replication of information, redundant computation, resiliency in the face of
inconsistent information, communication failures or node failures. These algorithms
are  generally  characterized  by  a  rather  high  level  of  message  type  communication 
between  the  distributed  processes.  Processes  generally  do  not  wait  for  a  response 
immediately  after  sending  a  message  and  in  many  cases  there  is  no  response  at 
all.  Furthermore,  communication  is  frequently  of  a  multicast  or  broadcast 
nature. In [CM82] each node relays a probe which it has received to a
dynamically determined subset of other nodes. It does not reply to the sender. In
[Ga82] a potential coordinator multicasts a message to all higher priority nodes
and then awaits responses. In an implementation of two phase commit, the
commit coordinator multicasts messages to all cohorts in a similar fashion.
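
The election pattern mentioned for [Ga82] can be illustrated with a small sketch. This is a simplified, single-process simulation and not the algorithm as published: the function name `run_election`, the priority table, and the `alive` set are all assumptions made for illustration, and timeouts and message loss are not modeled.

```python
def run_election(initiator, priorities, alive):
    """Return the node that becomes coordinator in this simplified model.

    priorities: dict mapping node name -> integer priority (higher wins)
    alive:      set of node names currently able to respond
    All names here are hypothetical; they do not come from the report.
    """
    candidate = initiator
    while True:
        # "Multicast" an election message to every higher priority node.
        higher = [n for n in priorities if priorities[n] > priorities[candidate]]
        # Only nodes that are up can answer.
        responders = [n for n in higher if n in alive]
        if not responders:
            # No higher priority node answered: the candidate takes over.
            return candidate
        # Shortcut: the highest priority responder continues the election.
        candidate = max(responders, key=lambda n: priorities[n])


if __name__ == "__main__":
    table = {"a": 1, "b": 2, "c": 3}
    print(run_election("a", table, alive={"a", "b"}))
```

Note that the sender addresses a set of nodes rather than a single peer, which is the communication pattern the paragraph above emphasizes.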

Another  aspect  of  communication  in  distributed  algorithms  is  that  a  message 
need  not  always  be  addressed  to  a  unique  process;  any  one  of  a  set  of  processes 
may  be  eligible  to  receive  it.  Thus  for  reasons  of  reliability,  load  sharing  or  to 
reduce  the  lengths  of  communication  paths,  duplicate  servers  may  be  distributed 
throughout a network. In the Pup internet [Bo82] name servers are duplicated at
each gateway since there is at least one gateway on each net and it is up most of
the  time.  In  general,  clients  do  not  care  which  member  of  a  set  of  identical 

Our major interest in this area has been to study the properties of a
language suitable for programming this class of algorithms. This work will be
described  in  the  next  section.  As  an  outgrowth  of  the  study,  however,  we  have 
been examining some specific algorithms. In particular we have developed a
distributed stable storage algorithm in which copies of a replicated database are
distributed over nodes connected to a broadcast medium. This is a generalization of
the work by Lampson [La81] on stable storage. In that model data is duplicated
on independent storage devices at a single node and failures are categorized into
those which are expected and those which are unexpected. The latter are
disasters for which stable storage offers no protection. Algorithms are provided for
failures in the former category which guarantee that data is preserved. Examples
of expected failures are processor crashes (i.e., the processor is reset to some
standard state), transient I/O errors, and a limited, spontaneous decay of information
on  mass  storage.  An  example  of  an  unexpected  failure  is  the  malicious  behavior 
of  a  processor.  There  are  two  disadvantages  of  this  approach.  Although  data 
may  be  preserved  at  a  node  where  a  processor  crash  has  occurred,  it  will  not  be 
available  during  the  time  that  the  processor  is  down.  Secondly,  malfunctions  of 
the  processor  other  than  simple  crashes  can  destroy  the  data. 
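
The single-node model described above can be sketched roughly as follows. This is a minimal illustration of duplicating a page on two independent devices and detecting decay with a checksum. It is not the algorithm of [La81] or of our own work, and the class names `Disk` and `StableStore` and their methods are hypothetical.

```python
import zlib


class Disk:
    """Toy storage device: page number -> (payload, checksum)."""

    def __init__(self):
        self.pages = {}

    def write(self, page, data):
        self.pages[page] = (data, zlib.crc32(data))

    def read(self, page):
        """Return the payload, or None if the page is missing or decayed."""
        entry = self.pages.get(page)
        if entry is None:
            return None
        data, checksum = entry
        return data if zlib.crc32(data) == checksum else None

    def decay(self, page):
        """Simulate spontaneous decay by flipping one bit of the payload."""
        data, checksum = self.pages[page]
        self.pages[page] = (bytes([data[0] ^ 1]) + data[1:], checksum)


class StableStore:
    """Keeps every page on two independent devices (assumed model)."""

    def __init__(self):
        self.primary, self.secondary = Disk(), Disk()

    def write(self, page, data):
        # Written one after the other, so a crash in between
        # leaves at least one copy readable.
        self.primary.write(page, data)
        self.secondary.write(page, data)

    def read(self, page):
        data = self.primary.read(page)
        return data if data is not None else self.secondary.read(page)

    def recover(self, page):
        """Repair a bad or stale copy from the other one."""
        a, b = self.primary.read(page), self.secondary.read(page)
        if a is None and b is not None:
            self.primary.write(page, b)
        elif a is not None and a != b:
            self.secondary.write(page, a)
```

Both copies sit behind one processor in this sketch, which makes the two disadvantages noted above visible: nothing can be read while that processor is down, and a misbehaving processor can overwrite both copies.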


The  algorithm  we  have  developed  is  designed  specifically  for  data  reliability 
in  a  broadcast  environment.  It  is  loosely  coupled  in  the  sense  that  it  is  designed 
to function despite the fact that not all copies of the data need agree and not all
processors  may  be  functioning  correctly.  A  significant  aspect  of  the  proposal  is 
that  if  no  errors  occur  the  redundancy  is  largely  transparent  to  the  procedures  for 
accessing  the  data,  requiring  no  extra  communication.  Additional  messages  are 
required  when  errors  are  detected  in  copies  of  stored  information.  The  system 
can handle a much wider class of errors than that described in [La81]. A
significant part of the work is the development of a Markov model to describe the
failure behavior of the system. The model relates various parameters of the
algorithm to the mean time to data loss. A paper describing this work has been
completed [Be83] and submitted for publication.
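
The model of [Be83] is not reproduced in this report. As a generic illustration of how a Markov model can relate parameters to the mean time to data loss, the sketch below assumes a simple birth-death chain: each good copy fails independently at rate `fail_rate`, a single repair process restores one copy at rate `repair_rate`, and data is lost when no good copy remains. These assumptions and all names are ours for illustration only.

```python
def mean_time_to_data_loss(copies, fail_rate, repair_rate):
    """Expected time until no good copy remains, starting with all copies good.

    Assumed model (not taken from [Be83]): with i good copies the chain moves
    down at rate i * fail_rate and, when i < copies, up at rate repair_rate.
    """
    if copies < 1 or fail_rate <= 0 or repair_rate < 0:
        raise ValueError("need copies >= 1, fail_rate > 0, repair_rate >= 0")
    # gap is T(i) - T(i-1), where T(i) is the expected time to loss from i.
    gap = 1.0 / (copies * fail_rate)      # no repair from the top state
    total = gap
    for good in range(copies - 1, 0, -1):
        gap = (1.0 + repair_rate * gap) / (good * fail_rate)
        total += gap
    return total


if __name__ == "__main__":
    # Two copies: the recurrence agrees with (3*f + r) / (2*f**2).
    print(mean_time_to_data_loss(2, fail_rate=0.01, repair_rate=1.0))
```

For two copies the recurrence reduces to the closed form (3f + r) / (2f^2), where f is the failure rate and r the repair rate, so faster repair lengthens the mean time to data loss roughly in proportion to r / f^2.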


Other  distributed  algorithms  which  are  being  studied  are  the  solution  to  the 
Byzantine Generals Problem [Do82] and protocols for updating multiple copy
data bases where serial consistency is not required [FM82]. In the former case we
are  examining  the  effects  of  communication  failure  as  opposed  to  processor  failure 
and  have  formulated  a  weaker  requirement  for  agreement.  In  the  latter  case  we 
have  developed  a  protocol  for  data  base  update  with  reduced  communication 
requirements. This research is being carried out by Mr. Gene Wuu, a PhD
student. It is still at an early stage.


B.  Distributed  Languages 

This  work  is  a  continuation  of  the  work  performed  in  the  previous  year  to 
develop  a  distributed  language  for  the  category  of  algorithms  described  in  the 
previous section. That work culminated in the presentation of two papers during
this past year, one of which outlined the communication aspects of a distributed
language [GB82], and the other a new switching technique for transporting
messages in a local area network [AGB083]. Two students working in this area
completed their degree requirements during this period: Dr. David Gelernter received
a  PhD  and  is  currently  teaching  in  the  Computer  Science  Department  at  Yale 
and  Mr.  Mauricio  Arango  received  an  MS  and  is  currently  working  in  industry. 

A  number  of  distributed  languages  have  been  described  in  the  literature  for 
implementing  distributed  algorithms.  None,  however,  are  oriented  towards  the 
type  of  applications  described  above.  Some  are  transaction  oriented  and  rely  on 
remote procedure call as a means of communication [LS81], [Br78] (Ada [DOD80]
also falls into this category). Others are more flexible in that asynchronous
message passing is provided as well [SY83], [Co79], [An81]. Others are primarily
message oriented [Li79], [Ho78], [Fe79]. None, however, support multicast
communication or allow a generalized addressing scheme in which any one of a set of
processes  -  which  might  be  distributed  through  the  net  -  can  be  the  recipient  of  a 
message. 

Our  work  differs  from  that  of  other  proposals  in  this  area  in  that  instead  of 
addressing  a  message  to  a  particular  process,  a  message  is  addressed  to  a  name 
which  is  visible  in  some  region  of  a  distributed  program  containing  both  the 
sender  and  the  intended  receiver(s).  If  the  message  is  sent  in  unicast  mode  then 
any  process  within  the  region  is  eligible  to  receive  it;  if  it  is  sent  in  multicast 
mode  then  multiple  processes  in  the  region  may  copy  the  message.  Thus  name 
based  addressing  naturally  integrates  both  concepts.  Messages  may  be  deleted 
when  they  are  no  longer  relevant.  For  example,  in  the  contract  net  protocol 
[Sm79] outstanding messages requesting bids should be deleted when the bid
period  is  over. 
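
The addressing scheme described above can be sketched in a few lines. This is an illustrative model only, not the syntax or semantics of the proposed language: the class `Region` and its methods `listen`, `send_unicast`, `receive`, `send_multicast` and `withdraw` are hypothetical names chosen for the example.

```python
from collections import defaultdict, deque


class Region:
    """A scope in which names are visible to senders and receivers (assumed model)."""

    def __init__(self):
        self.listeners = defaultdict(list)   # name -> processes eligible to receive
        self.pending = defaultdict(deque)    # name -> unicast messages not yet taken
        self.inboxes = defaultdict(list)     # process -> multicast copies delivered

    def listen(self, process, name):
        self.listeners[name].append(process)

    def send_unicast(self, name, message):
        """Address the message to a name; exactly one process will take it."""
        self.pending[name].append(message)

    def receive(self, process, name):
        """Any process listening on the name may take the oldest pending message."""
        if process in self.listeners[name] and self.pending[name]:
            return self.pending[name].popleft()
        return None

    def send_multicast(self, name, message):
        """Every process listening on the name gets its own copy."""
        for process in self.listeners[name]:
            self.inboxes[process].append(message)

    def withdraw(self, name):
        """Delete messages that are no longer relevant; return how many."""
        count = len(self.pending[name])
        self.pending[name].clear()
        return count
```

In contract net terms, a request for bids would be sent to a name such as "bids", and `withdraw("bids")` would remove outstanding requests once the bid period is over.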

This project is now continuing under the direction of a new PhD student,
Mr. Mustaque Ahamad. Our initial work consisted of a refinement of Dr.
Gelernter's proposal based on an examination of distributed algorithms taken
from the literature. In particular, communication statements dealing with
multicast have been modified, language structures for dynamically establishing
communication paths have been developed and an implementation schema suitable
for  a  general  communication  environment  has  been  designed.  A  syntax  has  been 
developed  which  includes  exception  handling.  A  technical  report  on  this  subject 
will  be  released  shortly. 

One aspect of this work is the development of formal semantics for the
communication constructs of the language. We are preparing to extend the work
described in [SS82] to deal with name based addressing, multicast and message
withdrawal. Initial investigations indicate that this will bear some relationship to
the work on temporal logic which was completed this past year under grant
support. Dr. Paul Harter completed the requirements for a PhD in December 1982
with  a  thesis  in  this  area  and  is  currently  teaching  in  the  Computer  Science 
Department  at  the  University  of  Colorado. 


C.  An  Implementation  of  Multicasting  on  a  Network  Computer 

A  project  has  been  in  progress  for  the  past  year  to  design  (and  ultimately 
implement)  multicast  communication  on  a  network  computer.  This  work  is  being 
done in collaboration with Prof. Larry Wittie and one of his students, Mr. Ariel
Frank. The target system consists of some Motorola 68000 computers (a few of
which are actually SUN workstations) connected by ethernet pathways. Some
grant equipment money was used for this during the past year. Additional items
were purchased with funds from an NSF equipment grant. That award was
partially based on results achieved with AFOSR support.

The problem is to design a communications kernel for each node in the
network which will support multicast addressing. This is a generalization of the
directed broadcast scheme of [Bo82]. The goal is to assure that a multicast
message is physically broadcast on a subset of ethernets which covers all nodes
containing potential receivers. Implementation will be based on the multicast
addressing  provided  in  ethernet  controllers.  Each  controller  responds  to  a  set  of 
logical  addresses  whose  membership  may  be  dynamically  changed.  This  work 
meshes  closely  with  the  language  project  described  in  the  previous  section. 
Names will be mapped into logical addresses. The contents of the set at a
particular node will correspond to the set of names being used by modules allocated to
that node. Implementation of the multicasting structure is being done in
Modula-2.  Successful  completion  of  this  work  will  yield  an  environment  which 
will  support  a  distributed  language  realized  as  an  extension  to  Modula-2. 
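
One way to picture the mapping of names into logical addresses is sketched below. The report does not say how the mapping is computed, so the hash-based scheme here is purely an assumption, as are the names `name_to_logical_address` and `Controller`. The sketch is written in Python for brevity rather than in Modula-2. The only Ethernet fact relied on is that a 48-bit address with the low-order bit of its first octet set is a group (multicast) address.

```python
import hashlib


def name_to_logical_address(name):
    """Map a program-level name to a 6-byte multicast address (assumed scheme)."""
    octets = bytearray(hashlib.sha1(name.encode("utf-8")).digest()[:6])
    octets[0] |= 0x01          # set the group bit so the address is multicast
    return bytes(octets)


class Controller:
    """Model of a controller that accepts a changeable set of logical addresses."""

    def __init__(self):
        self.accepted = set()

    def bind(self, name):
        """Called when a module on this node starts using the name."""
        self.accepted.add(name_to_logical_address(name))

    def unbind(self, name):
        """Called when no module on this node uses the name any longer."""
        self.accepted.discard(name_to_logical_address(name))

    def accepts(self, destination):
        return destination in self.accepted


if __name__ == "__main__":
    node = Controller()
    node.bind("bids")
    print(node.accepts(name_to_logical_address("bids")))
```

A real kernel would also have to handle two names that map to the same address, which this sketch ignores.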


No  patents  have  been  requested  on  this  research. 